79 research outputs found

    Sorting suffixes of two-pattern strings

    Recently, several authors presented linear recursive algorithms for sorting the suffixes of a string x. All these algorithms employ a similar three-step approach, based on an initial division of the suffixes of x into two sets: in step 1 sort the first set using recursive reduction of the problem, in step 2 determine the order of the suffixes in the second set based on the order of the suffixes in the first set, and in step 3 merge the two sets together. To optimize such an algorithm either for space or time, it may not be sufficient to optimize one of the three steps, since in doing so, one might increase the resources required for the others to an unacceptable extent. Franek, Lu, and Smyth introduced two-pattern strings as a generalization of Sturmian strings. Like Sturmian strings, two-pattern strings are generated by iterated morphisms, but they exhibit a much richer structure. In this paper we show that the suffixes of two-pattern strings can be sorted in linear time using a variant of the three-step approach outlined above. It turns out that, given the order of the suffixes in a two-pattern string, one can almost directly list in linear time all the suffixes of its expansion under a two-pattern morphism.

    On Baier's sort of maximal Lyndon substrings

    We describe and analyze, in terms of Lyndon words, an elementary sort of the maximal Lyndon factors of a string, and formally prove its correctness. Since the sort is based on the first phase of Baier’s algorithm for sorting the suffixes of a string, we refer to it as Baier’s sort.

    Verifying a border array in linear time

    A border of a string x is a proper (but possibly empty) prefix of x that is also a suffix of x. The border array β = β[1..n] of a string x = x[1..n] is an array of nonnegative integers in which each element β[i], 1 ≤ i ≤ n, is the length of the longest border of x[1..i]. In this paper we first present a simple linear-time algorithm to determine whether or not a given array y = y[1..n] of integers is a border array of some string on an alphabet of unbounded size. We state as an open problem the design of a corresponding and equally efficient algorithm on an alphabet of bounded size α. We then consider the problem of generating all possible distinct border arrays of given length n on a bounded or unbounded alphabet, and doing so in time proportional to the number of arrays generated. A previously published algorithm that claims to solve this problem in constant time per array generated is shown to be incorrect, and new algorithms are proposed. We state as open the design of an equally efficient on-line algorithm for this problem
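    The border-array definition in the abstract above can be illustrated with a short sketch. This is the standard failure-function computation, not the verification algorithm the paper presents; the function name `border_array` is chosen here for illustration, and indices are 0-based, so `beta[i]` corresponds to β[i+1] in the abstract's notation.

```python
def border_array(x):
    """Return beta, where beta[i] is the length of the longest border
    (proper prefix that is also a suffix) of x[:i + 1]."""
    n = len(x)
    beta = [0] * n
    for i in range(1, n):
        # Start from the longest border of the previous prefix and
        # fall back to shorter borders until one can be extended by x[i].
        b = beta[i - 1]
        while b > 0 and x[i] != x[b]:
            b = beta[b - 1]
        if x[i] == x[b]:
            b += 1
        beta[i] = b
    return beta


print(border_array("abaababa"))
```

    Each iteration increases the candidate border length by at most one and every fallback strictly decreases it, which is why the loop runs in time linear in the length of x.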

    Two-Pattern strings

    This paper introduces a new class of strings on {a, b}, called two-pattern strings, that constitute a substantial generalization of Sturmian strings while at the same time sharing many of their nice properties. In particular, we show that, in common with Sturmian strings, only time linear in the string length is required to recognize a two-pattern string as well as to compute all of its repetitions. We also show that two-pattern strings occur in some sense frequently in the class of all strings on {a, b}.

    Specific Effects of Synthetic Oligopeptides in Animal Cell Culture


    The new periodicity lemma revisited

    In 2006, the New Periodicity Lemma (NPL) was published, showing that the occurrence of two squares starting at a position i in a string necessarily precludes the occurrence of other squares of specified period in a specified neighbourhood of i. The proof of this lemma was complex, breaking down into 14 subcases, and requiring that the shorter of the two squares be regular. In this paper we significantly relax the conditions required by the NPL, removing the need for regularity altogether, and we establish a more precise result using a simpler proof based on lemmas that expose new combinatorial structures in a string, in particular a canonical factorization for any two squares that start at the same position.

    Isolation and some molecular characteristics of pig γ1-macroglobulin


    The role of the prefix array in sequence analysis: A survey

    The prefix array was apparently first computed and used algorithmically in 1984, playing a pivotal role in an optimal algorithm to determine all the tandem repeats in a given (DNA or protein) sequence. However, it is especially since the turn of the 21st century that applications of the prefix array to fundamental sequencing problems have been recognized. An important aspect of this expanding role has been the recognition that the prefix table and the border array are “equivalent” data structures; that is, one can be computed from the other in linear time. Since the border array in turn specifies all the periods of every prefix of the sequence, the prefix array thus turns out to be a structure of central importance. In this paper we survey important applications of the prefix array (in particular to approximate string matching under Hamming distance, as well as to the computation of covers and enhanced covers) and show how, unlike border array algorithms, these are extendible to sequences containing “don’t-care” or indeterminate letters such as {a, c} or {g, t}. This extension leads to a surprising correspondence between prefix arrays and undirected graphs that seems likely to be a fertile source of new insights in future. We conclude with an overview of sequencing problems that the authors believe can be handled using prefix array technology.
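    One direction of the equivalence mentioned in the abstract above, from prefix array to border array, can be sketched as follows. This is a minimal illustration assuming the usual definition of the prefix array (entry i is the length of the longest substring starting at position i that matches a prefix of the string), not code taken from the survey; the names `prefix_array` and `border_from_prefix` are hypothetical and indices are 0-based.

```python
def prefix_array(x):
    """pi[i] = length of the longest substring of x starting at i
    that is also a prefix of x (computed with the Z-algorithm)."""
    n = len(x)
    pi = [0] * n
    if n == 0:
        return pi
    pi[0] = n
    left = right = 0  # x[left:right] is the rightmost known prefix match
    for i in range(1, n):
        if i < right:
            # Reuse the value already computed inside the known match.
            pi[i] = min(right - i, pi[i - left])
        while i + pi[i] < n and x[pi[i]] == x[i + pi[i]]:
            pi[i] += 1
        if i + pi[i] > right:
            left, right = i, i + pi[i]
    return pi


def border_from_prefix(pi):
    """Derive the border array from a prefix array in linear time."""
    n = len(pi)
    beta = [0] * n
    # A prefix match of length pi[i] at position i yields a border of
    # that length ending at i + pi[i] - 1; scanning i downwards lets the
    # smallest i (the longest border) be written last.
    for i in range(n - 1, 0, -1):
        if pi[i] > 0:
            beta[i + pi[i] - 1] = pi[i]
    # A border of length b ending at j + 1 implies one of length b - 1 at j.
    for j in range(n - 2, -1, -1):
        beta[j] = max(beta[j], beta[j + 1] - 1)
    return beta


pi = prefix_array("abaababa")
print(pi, border_from_prefix(pi))
```

    Both passes in `border_from_prefix` touch each position once, so the conversion is linear in the sequence length; the reverse direction (border array to prefix array) is not shown here.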